relu dnn
Complexity of One-Dimensional ReLU DNNs
Kogan, Jonathan, Jananthan, Hayden, Kepner, Jeremy
Abstract--We study the expressivity of one-dimensional (1D) ReLU deep neural networks through the lens of their linear regions. We also propose a function-adaptive notion of sparsity that compares the expected regions used by the network to the minimal number needed to approximate a target within a fixed tolerance. Deep Neural Networks (DNNs) with Rectified Linear Unit (ReLU) activation functions are piecewise-linear functions whose expressive power can be studied via the number of linear regions that they create [1]-[3]. However, achieving such approximations for complicated functions typically demands substantial computational resources. The Lottery Ticket Hypothesis states that we can often remove many connections while maintaining similar performance, motivating the study of sparse DNNs [7].
Some Super-approximation Rates of ReLU Neural Networks for Korobov Functions
Introduction In recent years, Deep Neural Networks (DNNs) have emerged as a crucial technology in the fields of machine learning and artificial intelligence and achieved tremendous success. The remarkable success of DNNs has inspired numerous practical applications in computer science and engineering, including image recognition, natural language processing, autonomous driving, scientific computing, etc. Towards an explainable framework of machine learning, understanding the approximation power of DNNs is a fundamental question and remains an active research area.
Approximation and Generalization Abilities of Score-based Neural Network Generative Models for Sub-Gaussian Distributions
This paper studies the approximation and generalization abilities of score-based neural network generative models (SGMs) in estimating an unknown distribution $P_0$ from $n$ i.i.d. observations in $d$ dimensions. Assuming merely that $P_0$ is $ฮฑ$-sub-Gaussian, we prove that for any time step $t \in [t_0, n^{O(1)}]$, where $t_0 \geq O(ฮฑ^2n^{-2/d}\log n)$, there exists a deep ReLU neural network with width $\leq O(\log^3n)$ and depth $\leq O(n^{3/d}\log_2n)$ that can approximate the scores with $\tilde{O}(n^{-1})$ mean square error and achieve a nearly optimal rate of $\tilde{O}(n^{-1}t_0^{-d/2})$ for score estimation, as measured by the score matching loss. Our framework is universal and can be used to establish convergence rates for SGMs under milder assumptions than previous work. For example, assuming further that the target density function $p_0$ lies in Sobolev or Besov classes, with an appropriately early stopping strategy, we demonstrate that neural network-based SGMs can attain nearly minimax convergence rates up to logarithmic factors. Our analysis removes several crucial assumptions, such as Lipschitz continuity of the score function or a strictly positive lower bound on the target density.
Review for NeurIPS paper: Rational neural networks
The paper studies rational DNNs --- deep neural networks where rational functions (of small degrees) are used as non-linearities. The paper provides many interesting theoretical results on the approximation properties of the rational DNNs (specifically, in comparison to ReLU DNNs). The paper also provides two experiments (learning the solution of the 2-dimensional PDE and applications in generative adversarial networks), which are meant to demonstrate that rational activations have advantages compared to other popular activations (ReLu, sine, tanh, polynomial, etc) when used in actual DNN training. The theory presented in the paper establishes that: (1) Consider two problems: (i) Approximating (in the uniform norm) a function implemented with the rational DNNs using ReLU DNNs; and (ii) approximating a function implemented with the ReLU DNNs using rational DNNs. Theorem 3 shows that (ii) is much easier than (i): (ii) can be solved to eps-precision with log(log(1 / eps)) many parameters, whereas (i) requires at least log(1 / eps) parameters (exponentially more).
From Monte Carlo to neural networks approximations of boundary value problems
Beznea, Lucian, Cimpean, Iulian, Lupascu-Stamate, Oana, Popescu, Ionel, Zarnescu, Arghir
In this paper we study probabilistic and neural network approximations for solutions to Poisson equation subject to H\" older data in general bounded domains of $\mathbb{R}^d$. We aim at two fundamental goals. The first, and the most important, we show that the solution to Poisson equation can be numerically approximated in the sup-norm by Monte Carlo methods, { and that this can be done highly efficiently if we use a modified version} of the walk on spheres algorithm { as an acceleration method. This provides estimates which are efficient with respect to the prescribed approximation error and with polynomial complexity in the dimension and the reciprocal of the error.} {A crucial feature is that} the overall number of samples does not not depend on the point at which the approximation is performed. As a second goal, we show that the obtained Monte Carlo solver renders { in a constructive way} ReLU deep neural network (DNN) solutions to Poisson problem, whose sizes depend at most polynomialy in the dimension $d$ and in the desired error. In fact we show that the random DNN provides with high probability a small approximation error and low polynomial complexity in the dimension.
On the Optimal Expressive Power of ReLU DNNs and Its Application in Approximation with Kolmogorov Superposition Theorem
This paper is devoted to studying the optimal expressive power of ReLU deep neural networks (DNNs) and its application in approximation via the Kolmogorov Superposition Theorem. We first constructively prove that any continuous piecewise linear functions on $[0,1]$, comprising $O(N^2L)$ segments, can be represented by ReLU DNNs with $L$ hidden layers and $N$ neurons per layer. Subsequently, we demonstrate that this construction is optimal regarding the parameter count of the DNNs, achieved through investigating the shattering capacity of ReLU DNNs. Moreover, by invoking the Kolmogorov Superposition Theorem, we achieve an enhanced approximation rate for ReLU DNNs of arbitrary width and depth when dealing with continuous functions in high-dimensional spaces.
Rethink Depth Separation with Intra-layer Links
Fan, Feng-Lei, Li, Ze-Yu, Xiong, Huan, Zeng, Tieyong
The depth separation theory is nowadays widely accepted as an effective explanation for the power of depth, which consists of two parts: i) there exists a function representable by a deep network; ii) such a function cannot be represented by a shallow network whose width is lower than a threshold. However, this theory is established for feedforward networks. Few studies, if not none, considered the depth separation theory in the context of shortcuts which are the most common network types in solving real-world problems. Here, we find that adding intra-layer links can modify the depth separation theory. First, we report that adding intra-layer links can greatly improve a network's representation capability through bound estimation, explicit construction, and functional space analysis. Then, we modify the depth separation theory by showing that a shallow network with intra-layer links does not need to go as wide as before to express some hard functions constructed by a deep network. Such functions include the renowned "sawtooth" functions. Moreover, the saving of width is up to linear. Our results supplement the existing depth separation theory by examining its limit in the shortcut domain. Also, the mechanism we identify can be translated into analyzing the expressivity of popular shortcut networks such as ResNet and DenseNet, \textit{e.g.}, residual connections empower a network to represent a sawtooth function efficiently.
Side Effects of Learning from Low-dimensional Data Embedded in a Euclidean Space
He, Juncai, Tsai, Richard, Ward, Rachel
The low-dimensional manifold hypothesis posits that the data found in many applications, such as those involving natural images, lie (approximately) on low-dimensional manifolds embedded in a high-dimensional Euclidean space. In this setting, a typical neural network defines a function that takes a finite number of vectors in the embedding space as input. However, one often needs to consider evaluating the optimized network at points outside the training distribution. This paper considers the case in which the training data is distributed in a linear subspace of $\mathbb R^d$. We derive estimates on the variation of the learning function, defined by a neural network, in the direction transversal to the subspace. We study the potential regularization effects associated with the network's depth and noise in the codimension of the data manifold. We also present additional side effects in training due to the presence of noise.
ReLU Deep Neural Networks from the Hierarchical Basis Perspective
He, Juncai, Li, Lin, Xu, Jinchao
We study ReLU deep neural networks (DNNs) by investigating their connections with the hierarchical basis method in finite element methods. First, we show that the approximation schemes of ReLU DNNs for $x^2$ and $xy$ are composition versions of the hierarchical basis approximation for these two functions. Based on this fact, we obtain a geometric interpretation and systematic proof for the approximation result of ReLU DNNs for polynomials, which plays an important role in a series of recent exponential approximation results of ReLU DNNs. Through our investigation of connections between ReLU DNNs and the hierarchical basis approximation for $x^2$ and $xy$, we show that ReLU DNNs with this special structure can be applied only to approximate quadratic functions. Furthermore, we obtain a concise representation to explicitly reproduce any linear finite element function on a two-dimensional uniform mesh by using ReLU DNNs with only two hidden layers.